Using and Evaluating Differential Modeling in 
Intelligent Tutoring and Apprentice Learning Systems 

David C. Wilkins, William J. Clancey and Bruce G. Buchanan 

Knowledge Systems Laboratory 
Department of Computer Science 
Stanford University 
Stanford, CA 94305 

Abstract 

A powerful approach to debugging and refining the knowledge structures of a 
problem-solving agent is to differentially model the actions of the agent against 
a gold standard. This paper proposes a framework for exploring the inherent
limitations of such an approach when a problem solver is differentially modeled against
an expert system. A procedure is described for determining a performance upper
bound for debugging via differential modeling, called the synthetic agent method.
The synthetic agent method systematically explores the space of near-miss training
instances and expresses the limits of debugging in terms of the knowledge
representation and control language constructs of the expert system.

1 Introduction 

Artificial Intelligence has long been interested in methods to automatically refine 
and debug an intelligent agent. This is a central concern in machine learning and 
automatic programming, where the agent to be improved is a program. It is also 
a central concern in intelligent tutoring, where the agent to be improved is a human
problem solver. Many AI systems for improving an intelligent agent involve
differential modeling of the agent against the observable problem-solving behavior
of another agent. We focus on the situation where one of the agents is a
knowledge-based expert system and the knowledge structures to be improved encode factual
information that is declaratively represented.¹

This paper describes the synthetic agent method, which allows calculation of 
a performance upper bound on improvement to an intelligent agent attainable by 
differential modeling of the agent against an expert system. A performance upper
bound identifies missing or erroneous knowledge in an intelligent agent that a
particular differential modeling system is inherently incapable of identifying. By
contrast, most performance evaluation procedures aim to determine a performance
lower bound; they experimentally demonstrate that a particular differential modeling
system can successfully identify some missing or erroneous knowledge.

The synthetic agent method involves replacing the human problem solver in a 
differential modeling scenario with a synthetic agent that is another expert system.
The knowledge in the synthetic agent expert system is systematically modified to be
slightly different from the knowledge in the original expert system. The knowledge
in the synthetic agent is modified to be slightly ‘better’ in an apprenticeship learning 
scenario and slightly ‘worse’ in an intelligent tutoring scenario. 

This paper is organized as follows. Section 2 surveys previous and current 
work on improving an intelligent agent via differential modeling. Section 3 identifies
important performance evaluation issues related to evaluation of a differential
modeler. Section 4 presents and discusses the synthetic agent method. Finally, 
Section 5 describes an application of the synthetic agent method that is currently 
underway. 

This paper presents our framework for evaluating a differential modeling system.
No experimental results are given. A future paper will describe the use of the
framework to evaluate the ODYSSEUS modeling program (described in Section 5) in 
the context of intelligent tutoring and apprenticeship learning. 

¹ As much domain-specific knowledge as possible is declaratively represented in a well-designed
knowledge-intensive expert system. Domain-specific procedural knowledge is contained in an expert 
system shell for the generic problem class (Clancey, 1984). 






2 The Process of Differential Modeling 


Many AI systems that debug and refine an intelligent agent employ a method called
differential modeling; this is the process of identifying differences between the
observed behavior of a problem-solving agent and the behavior that would be expected
in accordance with an explicit model of problem solving.


[Figure 1 diagram; labels: Statement of Problem, Answer, Knowledge Differences Between PS and ES, Answer]


Figure 1: A general model of the differential modeling process. 
PS solves a problem, and DM finds differences between knowledge 
structures of PS and ES. In this paper, equal attention is given to 
the situation of apprenticeship learning where PS is a human expert
and the goal is to improve ES; and the situation of intelligent
tutoring where PS is a student and the goal is to improve PS. 


The differential modeling process is illustrated in Figure 1. The three major
elements are a problem solver (PS), a differential modeler (DM), and a
knowledge-based expert system (ES). The task of the DM is to identify differences between the
knowledge structures of PS and ES in the course of watching PS solve a problem, for
example a medical diagnosis problem. In the figure, Answer consists of all observable
behavior of the respective problem solver. The DM can be quite complex and can
easily exceed the complexity of the ES.










Two major tasks that confront a DM are global and local credit assignment, 
which are performed by a global and local learning critic, respectively. The global
critic determines when the observable behavior of PS suggests a difference between
the knowledge structures of ES and PS. In such a situation, the local critic is
summoned to identify possible knowledge differences between ES and PS that are
suggested by the actions of PS. A complete learning system consists of a global critic, 
local critic and a repair component (Dietterich and Buchanan, 1981); discussion of 
the repair stage is beyond the scope of this paper. 
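
The division of labor between the two critics can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Rule` type, the toy `solve` function, and the idea that the local critic is handed an explicit list of candidate rules are hypothetical and are not taken from any of the systems discussed in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical declarative knowledge element: a finding triggers an action."""
    finding: str
    action: str


def solve(rules, findings):
    """Observable behavior: actions of all rules whose finding is present."""
    ordered = sorted(rules, key=lambda r: (r.finding, r.action))
    return [r.action for r in ordered if r.finding in findings]


def global_critic(ps_actions, es_actions):
    """Signal a possible knowledge difference when observable behavior differs."""
    return ps_actions != es_actions


def local_critic(ps_actions, es_rules, candidates, findings):
    """Propose single-rule changes to ES that would reproduce the PS behavior."""
    proposals = []
    for rule in candidates:
        if rule in es_rules:
            trial = es_rules - {rule}      # hypothesis: this ES rule is erroneous
            change = ("remove", rule)
        else:
            trial = es_rules | {rule}      # hypothesis: ES is missing this rule
            change = ("add", rule)
        if solve(trial, findings) == ps_actions:
            proposals.append(change)
    return proposals


if __name__ == "__main__":
    es = frozenset({Rule("fever", "order-blood-culture")})
    extra = Rule("stiff-neck", "order-lumbar-puncture")
    findings = {"fever", "stiff-neck"}
    ps_actions = solve(es | {extra}, findings)   # PS knows one more rule
    if global_critic(ps_actions, solve(es, findings)):
        print(local_critic(ps_actions, es, [extra, *es], findings))
```

In this toy setting the global critic only compares action lists, and the local critic searches single-rule edits; the repair component is deliberately left out, as in the text.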


2.1 Previous work in differential modeling 


AI systems that employ a differential modeling approach to debugging and refining 
a problem-solving agent are found in the areas of machine learning, automatic 
programming, and intelligent tutoring. We first describe systems that do not employ 
a knowledge-based expert system as the explicit model of problem solving and then 
describe systems that do. 

The earliest such systems were in the area of machine learning, notably
Samuel’s checker player and Waterman’s poker player (Samuel, 1963; Waterman,
1970). The PS used by Samuel’s DM program was a book of championship checker 
games. The DM global critic task was accomplished by comparing the move of PS 
to the move that Samuel’s program made in the same situation. The local critic task 
was accomplished by adjusting the coefficients of a polynomial evaluation function 
for selecting moves so that the action of the program equaled the action of PS. A 
recent example of machine learning research that uses a differential modeling approach
is the PRE system for theory-directed data interpretation (Dietterich, 1984).
PRE learns programs for Unix commands from examples of the use of the commands. 
The DM employs constraint propagation to identify differences between the PS and 
the programs for commands. 
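
The coefficient-adjustment idea described above for the checker player can be sketched as follows. This is not Samuel's actual procedure: the linear scoring function, the perceptron-style update, and the names `choose`, `adjust`, `rate`, and `max_steps` are all assumptions chosen to keep the example small.

```python
def score(weights, features):
    """Linear evaluation function: weighted sum of move features."""
    return sum(w * f for w, f in zip(weights, features))


def choose(weights, moves):
    """Pick the move name whose feature vector scores highest."""
    return max(moves, key=lambda name: score(weights, moves[name]))


def adjust(weights, moves, book_move, rate=0.1, max_steps=100):
    """Nudge coefficients until the program's choice equals the book move."""
    weights = list(weights)
    for _ in range(max_steps):
        picked = choose(weights, moves)
        if picked == book_move:
            break                          # global critic: no discrepancy left
        # Local critic (assumed update rule): shift weight toward the
        # features of the book move and away from those of the picked move.
        for i, (fb, fp) in enumerate(zip(moves[book_move], moves[picked])):
            weights[i] += rate * (fb - fp)
    return weights


if __name__ == "__main__":
    moves = {"a": [1.0, 0.0], "b": [0.0, 1.0]}   # hypothetical feature vectors
    tuned = adjust([1.0, 0.0], moves, book_move="b")
    print(choose(tuned, moves))
```

Note that the adjusted coefficients are judged only by whether the program reproduces the book move, which is exactly the kind of purely performance-driven change examined later in Section 3.2.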

In automatic programming, the synthesis of LISP and PROLOG functions from
example traces falls under the rubric of debugging via differential modeling
(Biermann, 1978; Shapiro, 1983). The PS consists of the input/output behavior of a
correct program. The DM modifies the program being synthesized whenever it
does not give the same output as PS when given the same input.


In intelligent tutoring the goal is to ‘debug’ a human problem solver. Many
intelligent tutoring systems contain an expert system and use a differential modeling
technique, including the WEST program in the domain of games (Burton and
Brown, 1982), SOPHIE III and GUIDON in the domain of diagnosis (Brown et al.,
1982; Clancey, 1979), and the MACSYMA-ADVISOR in the domain of symbolic
integration (Genesereth, 1982). SOPHIE III uses an expert system for circuit diagnosis
as an aid in isolating hypothesis errors in the behavior of students who are performing
electronic troubleshooting. GUIDON is built over the MYCIN expert system for
medical diagnosis (Buchanan and Shortliffe, 1984); student hypothesis errors are
discovered in the process of conducting a Socratic dialogue.

Recent research within machine learning also uses an expert system as the 
explicit model of problem solving, especially within the subarea of apprenticeship 
learning. Apprenticeship learning is defined as a form of learning that occurs in 
the context of normal problem solving and uses underlying theories of the problem 
solving domain to accomplish learning. Examples of apprenticeship learning systems 
are LEAP and ODYSSEUS. The LEAP program refines knowledge bases for the VEXED 
expert system for VLSI circuit design (Mitchell et al., 1985). PS is a circuit designer
who is using the VEXED circuit design aid and the underlying theory used by the DM 
is circuit theory. ODYSSEUS refines and debugs knowledge bases for the HERACLES 
expert system shell, which solves problems using the heuristic classification method 
(Wilkins, 1986). When the ODYSSEUS problem domain is medical diagnosis, PS
is a physician diagnosing a patient. The DM uses two underlying theories, the 
principal one being a strategy theory of the problem-solving method. ODYSSEUS is 
also applicable to intelligent tutoring; it functions as a student modeling program 
for the GUIDON2 intelligent tutoring system (Clancey, 1986a).

2.2 Assumptions and issues in differential modeling 


Much of the power of an expert system derives from the quantity and quality of its
domain-specific knowledge. For the purposes of this paper, the principal function
of the differential modeler is to find factual domain knowledge differences between
the problem solver and the expert system. Our work assumes that the expert system
represents domain knowledge declaratively, including domain-specific control
knowledge. Further, as much as possible, the knowledge is represented independently
of how it will be used by problem-solving programs. This practice facilitates
use of the same domain knowledge for different purposes, such as problem solving,
explanation, tutoring and learning.

The framework provided by this paper for understanding the limits of debugging
via differential modeling has been fashioned with the following assumptions
in mind. First, we assume that an agent is differentially modeled against a
knowledge-based expert system that is capable of solving the problems presented 
to the human PS. Second, we assume that the observed actions of the agent consist 
of normal problem-solving behavior in a domain. And third, we assume that the 
goal of the differential modeling system is to discover factual domain knowledge 
differences between the agent and the expert system’s knowledge base, as opposed 
to the discovery of procedural control knowledge differences; procedural knowledge 
involves sequencing constructs such as looping and recursion. 

There are many open questions regarding debugging via differential modeling 
against an expert system. For instance, what are the types of knowledge in the PS 
that can and cannot be debugged using a differential modeling approach? What 
characteristics and organization of an ES facilitate differential modeling? What 
characteristics and organization impose inherent limitations? How can the strengths 
and weaknesses of a particular DM be best described? The evaluation methodology 
proposed in this paper, called the synthetic agent method, provides a framework 
for the exploration of these questions. 














3 Performance Evaluation Issues 


DM performance evaluation is intimately related to ES performance evaluation. The 
function of a DM is to improve the performance of an ES and so DM performance 
evaluation requires ES performance evaluation. Although ES evaluation is a difficult
and time-consuming task, there is agreement on the general approach that should be
taken when ES is an expert system program. Examples of performance evaluation
studies based on a sound methodology are the evaluations of the MYCIN, INTERNIST
and RL expert systems (Yu et al., 1979; Miller et al., 1984; Fu and Buchanan, 1985).

Two major functions of a DM are global and local credit assignment.² The
general problem of assessing the limits of a DM consists of finding performance 
upper bounds on a DM’s global and local critics. The difficulty of these functions 
is very domain dependent. In the domain used to develop repair theory, the global 
critic merely has to determine whether a student’s answer to a subtraction problem 
is correct (Brown and VanLehn, 1980). Sometimes a DM has a person perform 
the global credit assignment, for example in LEAP and MACSYMA-ADVISOR. In very 
difficult domains a DM might have a person perform both global and local credit 
assignment; TEIRESIAS takes this approach when debugging MYCIN (Davis, 1982). 
TEIRESIAS can be viewed as an intelligent editor that allows an expert to perform 
global and local credit assignment while watching MYCIN solve problems. 

In domains where expertise involves heuristic problem solving, having a program
perform global credit assignment is often very difficult. In a medical apprenticeship,
a student may recognize that his or her knowledge is deficient when he or
she can no longer make sense of the sequence of questions that the physician asks
the patient. Since a weakly plausible explanation for any sequence of questions often
exists, this criterion can be very difficult to implement in a computer program. A similar
situation exists in complex games such as chess or checkers. There is usually no way
to know that a given move is necessarily bad; it depends on what follows. Samuel’s
checker player solved the global critic problem by declaring a discrepancy to exist
whenever the expert (e.g., the book move) and the checker program recommended
different moves at a particular board configuration.

² Recall from Section 2 that the global critic notices that something is wrong and the local critic
determines which part of the knowledge base is responsible for the error. A learning program consists
of a global and local critic and a repair component.


There are often many different changes the local critic can make to effect an 
improvement in the performance element. The selection process is usually based
on which modification leads to the best improvement in the performance element.
Selection is very much affected by how ‘improvement’ is defined. This is further
discussed in Section 3.2. 


3.1 Performance evaluation and the synthetic agent method


The synthetic agent method proposed in this paper is considerably different from 
standard performance evaluation methods in two fundamental ways. The purpose 
of the remainder of Section 3 is to explain and justify these aspects of the synthetic 
agent method. In Section 3.2 we argue that a fruitful evaluation criterion for a
knowledge-based system should be the quality of the individual knowledge elements,
not the quality of the problem-solving performance of a particular problem-solving
program. These metrics only partially overlap and certainly conflict in the short
term. In Section 3.3, we describe how the focus of the proposed synthetic agent
method is to delineate a performance upper bound. A performance upper bound 
describes where and under what conditions a debugging system for a problem solver 
must fail. By contrast, a standard evaluation approach aims at showing the extent 
to which a debugging system can succeed. Further, instead of characterizing the 
limits of debugging in terms of a percentage of problems that cannot be solved, the 
synthetic agent method characterizes the performance upper bound in terms of the 
knowledge representation language and the inference constructs used in the expert 
system. 












3.2 Knowledge-oriented vs. performance-oriented validation

The ultimate goal of a DM is to improve the performance of a PS or ES. The 
architecture of knowledge-based systems requires a shift in our concept of improved 
performance. We refer to the type of validation technique we advocate as knowledge- 
oriented validation and distinguish it from the traditional practice of performance- 
oriented validation. 

Performance-oriented validation requires that modifications to a particular 
problem-solving program improve problem-solving performance. Because this type 
of validation has traditionally focused on improved performance with respect to a 
single problem-solving program, the veracity of the underlying knowledge has not 
been of overriding concern. A system designed exclusively to maximize problem-solving
performance of a particular problem-solving program may use a method of
knowledge representation in which the semantics of the domain knowledge cannot 
be represented easily, if at all. A polynomial evaluation function for rating checker 
positions, for example, captures none of the meaning of its terms. 

Knowledge-oriented validation might be defined as performance-oriented validation
that prohibits lessening the truth of individual knowledge elements solely for
the sake of problem-solving performance. The advent of large declarative knowledge
bases used by multiple problem-solving programs makes this perspective important.
Examples of multiple problem-solving programs that might use the same medical
knowledge base are programs to accomplish medical diagnosis, knowledge acquisition,
intelligent tutoring, and explanation. When multiple programs use the same
declaratively specified factual knowledge base, it is helpful to specify knowledge
in a manner that is independent, so far as possible, of its use. Knowledge-oriented
validation accomplishes this by requiring that changes to the knowledge base be
semantically meaningful.

Suppose we wish to be faithful to the traditional performance-oriented validation
paradigm when using multiple-purpose knowledge bases. This requires that
every time a learning program finds a change to the knowledge base that will improve
one problem-solving program, before that change can be recorded, the validation
method must ensure that the aggregate performance of all programs is improved.
This policy will be expensive and computationally overwhelming. Further, programs
for all the intended uses of a knowledge base are not necessarily in existence
at the time learning is taking place.

Another rationale for knowledge-oriented validation is our belief that performance
in the long term will be more correct and robust if the knowledge structures
are carefully developed. Moreover, when PS is a person, it is unrealistic, probably
even unwise, to attempt to replace semantically-rich knowledge structures with
others that deviate radically from them merely to improve short-term performance. 

It should be noted that to some extent all programs for improving an intelligent
agent aim at both good performance and good knowledge; nevertheless,
almost all past research in machine learning, intelligent tutoring and automatic 
programming has adopted a pure performance-oriented validation approach. This 
is especially true in automatic programming, where any mutation to the program 
to be debugged is judged to be acceptable if it causes the program to produce the 
correct output when given a correct input/output training instance (Shapiro, 1983). 

In machine learning, one of the best systems for refining an expert system
knowledge base is the SEEK2 program for the EXPERT expert system shell
(Ginsberg et al., 1985). This learning system takes a performance-oriented validation
approach. One possible input to SEEK2 is a representative set of past solved cases
and an initial knowledge base of rules. Given this input, SEEK2 attempts to modify
elements of the knowledge base so as to maximize the problem-solving performance
of the EXPERT expert system on the given representative set of solved problem cases.
In EXPERT, the strengths of inexact rules in the knowledge base are represented using
certainty factors (CFs). Examples of modification operators used by SEEK2 to
improve performance are LOWER-CF and RAISE-CF (Ginsberg, 1986). When a
representative set of past cases is present, the strengths of inexact rules are determined,
since certainty factors can be given a strict probabilistic interpretation (Heckerman,
1986). We strongly believe that an arbitrary change to the strength of a rule just
to improve performance is unjustifiable and unnecessary (Wilkins and Buchanan,
1986). The cost of this improved performance is a knowledge base that may contain
incorrect knowledge. The SEEK2 refinement approach is not an instance of
knowledge-oriented learning; it does not use knowledge-oriented validation.

A good example of knowledge-oriented learning is repair theory in the domain 
of subtraction problems (Brown and VanLehn, 1980). Repair theory is concerned 
with detecting underlying bugs, given the observable problem solving behavior of 
students. Repair theory has a procedural model of problem solving that claims to be 
a plausible model of the associated human skill; bugs of students are correlated with 
possible bugs in the problem-solving procedure for subtraction. Repair theory is 
similar in spirit to the synthetic agent method we propose for assessing a differential 
modeling system. Repair theory generates most of the significant possible bugs by 
deleting parts of the procedural knowledge; likewise we expect our approach to 
generate most of the significant possible types of bugs in the declarative domain
knowledge base, mainly by deleting parts of the knowledge base, as we shall describe
in Section 4. The main difference is that in the repair theory model of subtraction
the PS and ES knowledge is almost completely procedural, whereas we are interested
in factual knowledge that is declaratively represented.

3.3 Capability-oriented vs. limitation-oriented validation 

A typical way of validating that a DM improves an ES involves using disjoint
validation and training problem sets. The ES solves the validation problem set
and its performance is recorded. Then the DM improves the ES while watching 
a human expert PS solve a training problem set. Finally ES solves the validation 
problems again; the amount of improvement in performance provides a measure of 
the quality of the DM. 

This scenario establishes a lower bound on the quality of a DM. By increasing 
the size of the training problem set, DM might improve ES even more. We refer to 
validation methods that establish a lower bound on the quality of a DM as capability- 
oriented. For a given set of training and validation problems, capability-oriented 
validation shows that the DM is responsible for a more capable ES. 
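
The capability-oriented protocol just described can be written down as a short measurement routine. This is a minimal sketch under stated assumptions: `solve(kb, case)` stands for the ES answering a case from a knowledge base and `refine(kb, training)` stands for the DM improving that knowledge base while watching PS on the training set. Both are hypothetical interfaces supplied by the caller, and exact-match accuracy is an assumed performance measure.

```python
def accuracy(solve, kb, problems):
    """Fraction of (case, expected_answer) pairs that the ES answers correctly."""
    correct = sum(1 for case, expected in problems if solve(kb, case) == expected)
    return correct / len(problems)


def capability_gain(solve, refine, kb, training, validation):
    """Measure ES performance on validation problems before and after refinement.

    Returns (before, after, gain). The gain is a lower bound on DM quality:
    a larger training set might allow the DM to improve the ES further.
    """
    before = accuracy(solve, kb, validation)
    refined = refine(kb, training)         # DM watches PS on the training set
    after = accuracy(solve, refined, validation)
    return before, after, after - before
```

Because the returned gain depends on the particular training and validation sets, it says what the DM did achieve, not what it cannot achieve.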











Another method of validating a DM is to have the DM watch a student solve a 
training problem set. Let us assume that the student exhibits a representative set of 
the types of domain knowledge errors that could be made in the problem domain. 
A domain expert can manually identify the domain knowledge errors connected 
with each training problem. This manual analysis provides a performance upper
bound with respect to this training set for the DM, and the DM modeling program 
is measured against this standard. The goals of this type of manual analysis and 
our proposed automated analysis using the synthetic agent method are identical, in 
the case where the student and the problem set have been both constructed so as 
to allow all possible types of domain knowledge errors to be made. 

We desire to know those types of differences in an expert system knowledge 
base that cannot be detected or corrected via differential modeling. In contrast 
to the capability-oriented approach, our validation approach aims at determining 
when the differential modeler must fail; we are limitation-oriented. For example,
a limit of a program for inducing LISP functions from examples might be that the 
program can’t induce cases that require certain types of loop constructs. In our 
work, we have focused on showing certain conditions that force the differential 
modeling approach to fail under the most favorable of conditions, the single fault 
assumption. The multiple fault assumption would allow determination of a broader 
performance upper bound. 

4 Synthetic Agent Method of Validation 

The apprenticeship learning and tutoring scenarios shown in Figures 3 and 4 involve
two agents: a person and an expert system. The person serves as an expert and
student, in the context of apprenticeship learning and intelligent tutoring, respectively.
The synthetic agent method consists of replacing the person with a synthetic agent,
which is another expert system, in order to experiment with and validate the
differential modeling system objectively. The knowledge in the synthetic agent expert
system is modified to be slightly different from the knowledge in the original
expert system. The knowledge is modified to be slightly ‘better’ in an apprenticeship
learning scenario and slightly ‘worse’ in an intelligent tutoring scenario.


[Diagram for the apprentice learning scenario; labels: Problem Statement, Answer, Differences Between Knowledge Structures of PS and ES, Answer]

Figure 3: Apprentice learning scenario: Apprentice expert system 
watches human expert through the differential modeling program, 
with the goal of improving the apprentice program’s knowledge. 


An advantage of the synthetic agent method is control over interpersonal variables
involved in differential modeling. An example of an interpersonal variable is
the problem-solving style of a PS, exemplified by the set of strategic diagnostic
operators used by the PS. Diagnostic operators specify the permissible task procedures
that can be applied to a problem as well as the allowable methods for achieving the
task procedures. Examples of problem-solving operators in the domain of diagnosis
include: ask general questions, ask clarifying questions, refine hypotheses,
differentiate between hypotheses, and test hypotheses. Another interpersonal variable is
the quantity of domain-specific knowledge that the PS possesses.


While control of interpersonal variables almost always leads to an incorrect
DM performance lower bound, conclusions reached concerning a performance upper
bound are sound when interpersonal variables are controlled. If a system is inherently
limited under the most favorable assumptions possible for differential modeling,
it will still be inherently limited in settings that are less favorable for differential
modeling.









[Diagram for the intelligent tutoring scenario; labels: Problem Statement, DM (Differential Modeler), Answer, Differences Between Knowledge Structures of PS and ES, Answer]

Figure 4: Intelligent tutoring scenario: Expert system watches 
student through the differential modeling program, with the goal 
of improving the student’s knowledge. 

In the learning and tutoring scenarios, the synthetic agent method treats the 
original expert system knowledge base as a “gold standard”. The apprentice ES 
and the student PS always have a deficiency with respect to this gold standard. In 
this paper we restrict our analysis to the situation where the apprentice’s knowledge 
differs from the gold standard by a single element of knowledge; hence we have a 
single fault assumption. Two types of knowledge base discrepancies are possible: 
missing knowledge and erroneous knowledge. The synthetic agent method procedure
described in Section 4.1 shows how deletion of knowledge can represent the
space of missing and erroneous knowledge. Other methods for creating erroneous
knowledge are described in Section 4.3.

For a given problem statement, a distinction is made between referenced,
observable, and essential knowledge in the ES’s knowledge base. The relation between
these categories is illustrated in Figure 5. Referenced knowledge is simply knowledge
that is accessed during a problem-solving case. Observable knowledge is knowledge
whose removal leads to different external observable behavior of a PS, either in the
sequence of actions that the PS exhibits or the final answer. Essential knowledge is
knowledge whose removal leads to a significantly different final answer.

Figure 5: The relation between different categories of knowledge,
with respect to a particular problem case.
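
The three categories can be computed by ablation, one element at a time. The sketch below is illustrative only: the tuple encoding of knowledge elements and the toy `run` function are hypothetical, and any change in the final answer is treated as "significant", which a plausible reasoning system would need to define more carefully.

```python
def run(kb, case):
    """Toy expert system (hypothetical): returns (referenced, trace, answer)."""
    referenced, trace, answer = set(), [], set()
    for element in sorted(kb):
        kind = element[0]
        if kind == "askfirst":
            referenced.add(element)        # consulted, but changes nothing here
        elif kind == "ask":
            referenced.add(element)
            trace.append("ask " + element[1])
        elif kind == "rule" and element[1] in case:
            referenced.add(element)
            trace.append("conclude " + element[2])
            answer.add(element[2])
    return referenced, trace, frozenset(answer)


def classify(kb, case, run):
    """Label each referenced element with its innermost category."""
    referenced, trace, answer = run(kb, case)
    labels = {}
    for element in referenced:
        _, trace2, answer2 = run(kb - {element}, case)
        if answer2 != answer:
            labels[element] = "essential"      # removal changes the final answer
        elif trace2 != trace:
            labels[element] = "observable"     # removal changes only the actions
        else:
            labels[element] = "referenced"     # removal has no visible effect
    return labels


if __name__ == "__main__":
    kb = frozenset({
        ("rule", "fever", "infection"),
        ("rule", "rash", "measles"),
        ("ask", "weight"),
        ("askfirst", "weight"),
    })
    for element, label in sorted(classify(kb, {"fever"}, run).items()):
        print(element, label)
```

Since essential knowledge is a subset of observable knowledge, which is a subset of referenced knowledge, each element is reported under the most specific label that applies.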

Of most concern is the apprentice’s ability to acquire the essential knowledge
elements connected with a problem statement. These are the relations most important
for solving a given case. For plausible reasoning systems, what comprises a
significantly different answer needs to be specified. For instance, if there are multiple
diagnoses, the significance of the order in which the hypotheses are ranked needs
to be determined. Acquisition of elements that are observable but not essential is
also of interest, since they can be essential elements with respect to another problem
statement.

The procedure for calculating a performance upper bound on a differential 
modeling system is now presented. 












4.1 The synthetic agent method 


Step 1. Create synthetic agent. Replace PS with a synthetic agent: a copy of ES 
with initially the same domain knowledge. 

Step 2. Solve problem case. Solve a problem using PS and save the solution trace, 
i.e., the observable actions of PS and the final answer. 

Step 3. Identify observable knowledge. For a particular problem case, collect all
elements in the knowledge base that were referenced by PS during problem
solving. Identify the observable knowledge: the subset of the referenced
knowledge whose removal would lead to a different solution trace or a different final
answer.

Step 4. For each observable knowledge element: 

Step 4a. Remove the element from ES. In an apprenticeship learning scenario
this creates an apprentice expert ES with missing knowledge. In
an intelligent tutoring scenario the element removed from the ES is declared
to be erroneous.³ Since the element is still present in PS, the
synthetic student PS has erroneous knowledge.

Step 4b. Detect and localize knowledge discrepancy. Have the PS solve the 
problem case. See if DM can detect (the global critic problem) and 
localize (the local critic problem) the knowledge difference. 

Step 5. For each observable knowledge element: 

Step 5a. Remove the element from PS. In an intelligent tutoring scenario
this creates a synthetic student PS with missing knowledge. In an apprenticeship
learning scenario the element removed from the PS is declared
to be erroneous.³ Since the element is still present in ES, the apprentice
expert ES has erroneous knowledge.

Step 5b. Detect and localize knowledge discrepancy. Have the PS solve the
problem case. See if DM can detect (the global critic problem) and
localize (the local critic problem) the knowledge difference.

³ N.B. This element of knowledge is treated as erroneous for purposes of validation. In reality,
the element is true knowledge.
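
Steps 4 and 5 above can be sketched as a loop over the observable elements of one problem case. This is a minimal illustration, not the ODYSSEUS implementation: `solve(kb, case)` and `dm(trace, es_kb, case)` are hypothetical interfaces, knowledge bases are modeled as frozensets of `(finding, action)` pairs, and the toy DM is given an explicit candidate list to search.

```python
def synthetic_agent_trial(es_kb, observable, case, solve, dm):
    """Run Steps 4 and 5 for one case under the single fault assumption.

    Returns tuples (step, element, detected, localized).
    """
    results = []
    for element in sorted(observable):
        reduced = es_kb - {element}
        # Step 4: element removed from ES, PS keeps it.
        # Step 5: element removed from PS, ES keeps it.
        for step, ps_kb, es_variant in (("4", es_kb, reduced), ("5", reduced, es_kb)):
            trace = solve(ps_kb, case)                        # PS solves the case
            detected, suspect = dm(trace, es_variant, case)   # global, then local critic
            localized = detected and suspect == element
            results.append((step, element, detected, localized))
    return results


def not_localized(results):
    """Faults the DM missed; these delimit its performance upper bound."""
    return [(step, element) for step, element, _, localized in results if not localized]


def toy_solve(kb, case):
    """Toy agent: sorted actions of all rules whose finding occurs in the case."""
    return sorted(action for finding, action in kb if finding in case)


def toy_dm(trace, es_kb, case, candidates):
    """Toy DM: detect by trace comparison, localize by single-element edits."""
    if toy_solve(es_kb, case) == trace:
        return False, None
    for c in candidates:
        trial = es_kb - {c} if c in es_kb else es_kb | {c}
        if toy_solve(trial, case) == trace:
            return True, c
    return True, None


if __name__ == "__main__":
    kb = frozenset({("fever", "culture"), ("cough", "xray")})
    dm = lambda trace, es, case: toy_dm(trace, es, case, sorted(kb))
    print(not_localized(synthetic_agent_trial(kb, kb, {"fever", "cough"}, toy_solve, dm)))
```

Collecting the faults that are never localized, and then describing them in terms of the representation constructs involved, is what turns such a loop into a statement about a performance upper bound.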







4.2 Discussion of synthetic agent method 


An expert system’s explanation facility can be helpful in locating the observable
knowledge with respect to a given problem case. One of the hallmarks of a good
expert system is its ability to explain its own reasoning. So it is not too much to
ask for those pieces of knowledge used on a problem case, and a good explanation
system might even be able to identify the essential knowledge. At worst, given
the pieces of knowledge that were used to solve a particular problem, the essential
pieces of knowledge can be determined by experimentation. Usually, only a small
amount of an expert system’s domain knowledge is observable with respect to a
given problem, and our experiences in the medical diagnosis domain have shown us
that only a small amount of the observable knowledge is essential knowledge.

Some knowledge that is referenced by the expert system may not have 
observable consequences, even if it is used by the problem solver, since the removal 
of knowledge does not always affect the external behavior of a problem solver. For 
instance, in MYCIN and NEOMYCIN, terms that represent medical symptoms and 
measurements, such as patient weight, have an ASKFIRST property. The expert 
system uses the value of this property to decide whether the value of a variable is 
first determined by asking the user or first determined by derivation by some other 
method, such as from first principles. However, if the system does not possess 
techniques for deriving the information from other principles, then the external 
behavior of the system is the same regardless of the value of the ASKFIRST property. 
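
The ASKFIRST case can be illustrated with a small Python sketch. It is not 
MYCIN or NEOMYCIN code: the function `find_value`, the `derivations` table, and 
the `ask_user` callable are hypothetical names introduced only to show why the 
flag has no observable effect when no derivation method exists. 

```python
def find_value(finding, askfirst, derivations, ask_user):
    """Return (value, actions) for a finding under a given ASKFIRST flag.

    `derivations` maps a finding to a derivation function; a finding with
    no entry can only be obtained by asking the user.
    """
    derive = derivations.get(finding)      # None if no derivation exists
    methods = [("ask", ask_user)]
    if derive is not None:
        methods.append(("derive", derive))
    if not askfirst:
        methods.reverse()                  # try derivation before asking
    actions = []
    for name, method in methods:
        actions.append(name)               # externally visible behavior
        value = method(finding)
        if value is not None:
            return value, actions
    return None, actions
```

When `derivations` has no entry for the finding, the list of methods contains 
only the "ask" action, so both flag values produce the same external behavior 
and the ASKFIRST element is not observable for that problem case. 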

When testing the global critic in steps 4b and 5b of the synthetic agent 
method, part of the assessment must relate to whether the apprentice detects 
knowledge base differences close to the point in the problem-solving session where 
the different knowledge was used. This temporal proximity is important, since the 
problem-solving context at this point in the problem-solving session strongly focuses 
the search for missing or erroneous knowledge. 
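
One simple way to quantify this temporal proximity is to count the 
problem-solving steps between the use of the differing knowledge and the first 
detection. The measure below is an illustrative assumption, not one defined in 
this paper; `used_at` and `flagged_steps` are hypothetical step indices taken 
from a recorded session. 

```python
def detection_lag(used_at, flagged_steps):
    """Steps between use of the differing knowledge and its first detection.

    Returns None when the global critic never flags a difference at or
    after the step where the knowledge was used.
    """
    later = [step for step in flagged_steps if step >= used_at]
    return min(later) - used_at if later else None
```

A lag of zero means the difference was flagged at the step where the knowledge 
was used, which is when the problem-solving context is most informative. 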














4.3 Categories of errors 


The knowledge organization that we focus upon specifies all factual domain 
knowledge in a declarative fashion. In such a knowledge base, there are two main 
categories of errors: missing and erroneous knowledge. Missing knowledge is absent 
from the knowledge base, and erroneous knowledge is factually incorrect knowledge 
that is present in the knowledge base. 

The space of missing knowledge is easy to generate, especially with the single 
fault assumption. Recall that the original expert system serves as our gold 
standard and the domain knowledge in the expert system is declaratively represented. 
Hence, the number of single faults from missing knowledge is equal to the number 
of elements in the declarative knowledge base. 

The space of erroneous knowledge is much more difficult to describe. The 
synthetic agent method takes a novel approach to the problem in steps 4a and 
5a. An erroneous element is created by declaring a correct knowledge element to 
be erroneous for purposes of validation. We are also considering other approaches. 
Much of the knowledge is represented declaratively and typed. Therefore, erroneous 
knowledge can be generated by substituting different values for the knowledge in 
the range of the type, as long as the assumption can be made that the erroneous 
knowledge is at least correctly typed by the problem solver. The space of possible 
variations of declarative associational rule knowledge is significantly reduced by the 
practice used in the HERACLES expert system shell of factoring different types of 
knowledge from the domain knowledge, such as causal, definitional, and control 
knowledge (Clancey, 1986b). 
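
The type-range substitution idea can be sketched in Python as follows. This is 
an assumed encoding rather than the HERACLES representation: a fact is a tuple 
whose first item is the relation name, `signature` gives a type name for each 
argument, and `type_ranges` lists the legal values of each type. All of these 
names, and the example values in the test, are hypothetical. 

```python
from typing import Dict, Iterator, Sequence, Tuple

Fact = Tuple[str, ...]   # (relation, arg1, arg2, ...)


def typed_mutants(fact: Fact, signature: Sequence[str],
                  type_ranges: Dict[str, Sequence[str]]) -> Iterator[Fact]:
    """Yield correctly typed variants of `fact` that differ in one argument."""
    relation, *args = fact
    for i, (arg, type_name) in enumerate(zip(args, signature)):
        for candidate in type_ranges[type_name]:
            if candidate != arg:
                # Replace only argument i, keeping the variant well typed.
                yield (relation, *args[:i], candidate, *args[i + 1:])
```

A generated variant is only a candidate error: it may coincide with another 
true element of the knowledge base, so each variant would still need to be 
checked against the gold standard before use. 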


5 Application of Synthetic Expert Method 


Our investigations of a performance upper bound for a differential modeler are 
being performed in the context of the HERACLES and ODYSSEUS systems. HERACLES 
is an expert system shell that solves classification-type problems using the 
heuristic classification method (Clancey, 1985). The ODYSSEUS program differentially 
models a PS against any ES implemented using the HERACLES expert system shell 
(Wilkins, 1986). When PS is a human expert, ODYSSEUS functions as a knowledge 
acquisition program for the HERACLES expert system shell. When PS is a student, 
ODYSSEUS functions as a student modeling program for the GUIDON2 intelligent 
tutoring system, which is built over HERACLES. 


[Figure 6 diagram not reproduced. Surviving labels: Problem Statement; 
Differences Between Knowledge Structures of PS and ES.] 

Figure 6: Synthetic agent validation situation for apprenticeship 
learning in which the role of the PS has been filled by a synthetic 
expert system. In apprenticeship learning, the DM watches PS to 
improve ES's knowledge structures. 

In HERACLES, domain knowledge is encoded using a relational language and 
MYCIN-type rules (Clancey, 1986b). The knowledge relations of the relational 
language are predicate calculus representations of the domain knowledge, written using 
the logic programming language MRS. For example, an instantiation of the 
proposition (SUGGESTS $PARM $HYP) represents the fact that if a particular parameter 
is true then this suggests that a particular hypothesis is true. An instantiation of 
the template (ASKFIRST $FINDING $FLAG-VALUE) specifies whether the system 
should first ask the user for the value of a finding, or derive the information from 
existing information. The major domain knowledge base for HERACLES at this time 
is the NEOMYCIN knowledge base for diagnosing meningitis and neurological 
problems (Clancey, 1984). A second effort in the sand casting domain is called CASTER 
(Thompson and Clancey, 1986). 
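
To make the notion of instantiating a knowledge relation concrete, the sketch 
below matches a template against a ground fact. It is a Python rendering for 
illustration only, not MRS syntax or HERACLES code; templates and facts are 
modeled as tuples of strings, and the domain terms in the test are hypothetical 
placeholders rather than entries from the NEOMYCIN knowledge base. 

```python
def match(template, fact):
    """Return variable bindings if `fact` instantiates `template`, else None.

    Items beginning with "$" are variables; every other item must match
    the corresponding item of the fact exactly.
    """
    if len(template) != len(fact):
        return None
    bindings = {}
    for pattern_item, fact_item in zip(template, fact):
        if pattern_item.startswith("$"):
            # A repeated variable must be bound to the same value each time.
            if bindings.setdefault(pattern_item, fact_item) != fact_item:
                return None
        elif pattern_item != fact_item:
            return None
    return bindings
```

Because each declarative element is a separate instantiation of a known 
template, removing or altering one element is a well-defined single 
perturbation of the knowledge base. 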

Three aspects of the HERACLES expert system shell facilitate the task of 
differential modeling faced by ODYSSEUS. First, distinctions are made between the 
different types of knowledge in HERACLES' knowledge base, such as heuristic, 
definitional, causal, and control knowledge. Second, the method of reasoning, called 
hypothesis-directed reasoning, approximates that used by human experts (Clancey, 
1984). Hence, HERACLES can be viewed as a simulation of an expert's process of 
diagnosis. Third, the control knowledge is explicitly represented as a procedural 
network of subroutines and metarules that are both free of domain knowledge; the 
subroutines and metarules use variables rather than specific domain terms (Clancey, 
1986b). By contrast, the heuristic rules in MYCIN have a great deal of control 
knowledge embedded in the premises of the rules (Clancey, 1983; Buchanan and Shortliffe, 
1984). 

Figure 5 shows the place of the ODYSSEUS DM in the context of debugging an 
apprentice expert system. The DM tracks the problem-solving actions of the PS step 
by step. For each observable step of the problem solver, ODYSSEUS generates and 
scores the alternative lines of reasoning that can explain the reasoning step. If the 
global critic does not find any plausible reasoning path, or all found paths have a low 
plausibility, ODYSSEUS assumes that there is a difference in knowledge between the 
human problem solver and the expert system. The local critic attempts to locate the 
knowledge difference either automatically or by asking the expert specific questions. 
ODYSSEUS' analysis of problem-solving steps uses two underlying domain theories: a 
strategy theory of the problem-solving method called hypothesis-directed reasoning 
using the heuristic classification method, and an inductive predictive theory for 
heuristic rules that uses a library of previously solved problem cases. 
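
The global critic's decision rule described above can be sketched as a 
threshold test. This is a simplified illustration, not the ODYSSEUS critic: 
`explain` is a hypothetical callable returning scored lines of reasoning for 
one observed step, and the numeric `threshold` is an assumed parameter that 
the paper does not specify. 

```python
def global_critic(step, explain, threshold=0.5):
    """Flag a knowledge difference when no explanation of `step` is plausible.

    `explain(step)` returns a list of (line_of_reasoning, plausibility)
    pairs; an empty list means no reasoning path was found at all.
    """
    scored = explain(step)
    best = max((score for _, score in scored), default=0.0)
    return best < threshold
```

When the function returns True, control would pass to the local critic, which 
tries to locate the specific knowledge difference. 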


The ODYSSEUS global and local critics are themselves being implemented 
as two HERACLES-based expert systems. There are three reasons why we choose to 
implement the critics as expert systems. First, the task that confronts the learning 
critics is a knowledge-intensive task (Dietterich and Buchanan, 1981), and expert 
system techniques are useful for representing large amounts of knowledge. Second, 
with an expert system architecture, the reasoning method used by the critics can 
be made explicit and easily evaluated, since the domain knowledge is declaratively 
encoded using HERACLES' knowledge relations and simple heuristic rules. Third, 
since ODYSSEUS is designed to improve any HERACLES-based expert system, it can 
theoretically improve itself in an apprenticeship learning setting. 

Approximately sixty different knowledge relations in HERACLES specify the 
declarative domain knowledge. It would be useful to know how successful the global 
and local critics are at detecting discrepancies in the different knowledge relations 
of the knowledge representation language. Are there certain types of knowledge 
relations whose absence is always noticeable? Are there particular types of knowledge 
whose absence is very hard to recognize? For example, HERACLES represents final 
diagnoses in a hierarchical tree structure; determining that a problem is caused by 
a missing link in this structure may be very difficult for the apprentice to discover. 
By contrast, it may be very easy to discover whether a trigger property of a rule is 
missing. A trigger property causes the conclusion of a rule to be treated as an active 
hypothesis if particular clauses of the rule premise are satisfied. Clearly global and 
local credit assignment are greatly affected by the complexity of the procedural 
control knowledge used in the expert system shell. 
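
The questions above suggest tabulating critic performance per knowledge 
relation. The sketch below shows one way to do this, assuming each synthetic 
agent trial is summarized as a hypothetical triple of relation name, whether 
the global critic detected the perturbation, and whether the local critic 
localized it. The data layout is an assumption for illustration, not a format 
used by ODYSSEUS. 

```python
from collections import defaultdict


def detection_rates(trials):
    """Map each relation to (detection rate, localization rate).

    `trials` is an iterable of (relation_name, detected, localized),
    one entry per single-element perturbation.
    """
    counts = defaultdict(lambda: [0, 0, 0])   # [trials, detected, localized]
    for relation, detected, localized in trials:
        entry = counts[relation]
        entry[0] += 1
        entry[1] += bool(detected)
        entry[2] += bool(localized)
    return {rel: (det / n, loc / n) for rel, (n, det, loc) in counts.items()}
```

Relations with low rates would mark the parts of the representation language 
where differential modeling is weakest. 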


6 Summary 


With the proliferation of expert systems, methods of intelligent tutoring and 
apprenticeship learning that are based on differential modeling of the normal 
problem-solving behavior of a student or expert against a knowledge-intensive expert system 
should become increasingly common. The synthetic agent method is proposed as 
an objective means of assessing the limits of a particular differential modeling 
program in the context of intelligent tutoring and apprenticeship learning. The power 
of a differential modeler is crucially dependent upon the expert system's method 
of knowledge representation and control. The synthetic agent method provides a 
means of expressing the limitations of a differential modeler in terms of the 
knowledge representation and control vocabulary. 

The synthetic agent method involves a systematic perturbation of a program 
that takes the place of the student or expert. Traditionally, methods of evaluating 
a differential modeler have focused on a performance lower bound. The described 
synthetic agent method focuses on establishing a performance upper bound. It 
provides a means of exploring the extent to which a differential modeling system is able 
to detect and isolate an arbitrary difference between a knowledge base of an expert 
system and the problem-solving knowledge of a student or expert. Our work to 
date confirms our belief that the task of differential modeling is easier the more an 
expert system represents factual domain knowledge in a declarative fashion. 

The validation framework described in this paper is being used to assess 
the limits of the ODYSSEUS modeling program in the context of intelligent tutoring 
and apprenticeship learning. Students and experts are being differentially modeled 
against knowledge bases for the HERACLES’ expert system shell. This should lead 
to a better understanding of the synthetic agent method, the ODYSSEUS modeling 
program, and the extent to which HERACLES’ method of knowledge representation 
and control facilitates differential modeling. 